Forecasts help businesses allocate resources and achieve their goals. At LinkedIn, product owners use forecasts to set business targets, track outlook, and monitor health. Engineers use forecasts to efficiently provision hardware. Developing a forecasting solution for these needs requires accurate and interpretable forecasts on a variety of time series, at sub-daily to quarterly frequencies. We present Greykite, an open-source Python library for forecasting that has been deployed on more than twenty use cases at LinkedIn. Its flagship algorithm, Silverkite, provides interpretable, fast, and highly flexible univariate forecasts that capture effects such as time-varying growth and seasonality, autocorrelation, holidays, and regressors. The library enables self-serve accuracy and trust by facilitating data exploration, model configuration, execution, and interpretation. Our benchmark results show good out-of-the-box speed and accuracy on datasets from a variety of domains. Over the past two years, Greykite forecasts have been trusted by Finance, Engineering, and Product teams for resource planning and allocation, target setting and progress tracking, anomaly detection, and root-cause analysis. We expect Greykite to be useful to forecast practitioners with similar applications who need accurate, interpretable forecasts that capture the complex dynamics common to time series related to human activity.
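A minimal sketch of how a Silverkite forecast is configured through Greykite's high-level interface, assuming the Forecaster/ForecastConfig entry points from the library's quickstart; class, parameter, and attribute names may differ across versions and should be verified against the installed release:

```python
# Hedged sketch of a Silverkite forecast via Greykite's Forecaster API
# (names assumed from the project's quickstart; verify before use).
import pandas as pd
from greykite.framework.templates.autogen.forecast_config import ForecastConfig, MetadataParam
from greykite.framework.templates.forecaster import Forecaster

# placeholder daily series; in practice this is the business metric to forecast
df = pd.DataFrame({"ts": pd.date_range("2020-01-01", periods=730, freq="D"),
                   "y": range(730)})

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template="SILVERKITE",   # flagship algorithm
        forecast_horizon=30,           # forecast 30 steps ahead
        coverage=0.95,                 # 95% prediction intervals
        metadata_param=MetadataParam(time_col="ts", value_col="y", freq="D"),
    ),
)
forecast_df = result.forecast.df       # point forecasts plus interval bounds
```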
A Machine Translation (MT) system generally aims to automatically render a source language into a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. Among these methods, Statistical Machine Translation (SMT) uses probabilistic and statistical techniques to analyze and convert text. This paper describes the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefly described with respect to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To build the system, the MOSES open-source SMT toolkit is employed. Distance-based reordering is utilized with the aim of capturing grammar rules and context-dependent adjustments through a phrase reordering categorization framework. In our experiments, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
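For reference, the BLEU portion of such an evaluation can be reproduced with a standard scorer like sacreBLEU; the snippet below is a generic illustration with toy data, not the paper's exact evaluation pipeline or tokenization settings:

```python
# Minimal corpus-level BLEU with sacreBLEU (illustrative data and settings).
import sacrebleu

hypotheses = ["the cat sat on the mat"]            # system translations
references = [["the cat is sitting on the mat"]]   # one reference stream
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```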
Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation capability. We then propose two designs to improve this metric in Transformers. Specifically, we introduce a relative position embedding that explicitly maximizes attention resolution. Moreover, we use blockwise causal attention during inference for better resolution. We evaluate different Transformer variants on language modeling. Experimental results show that our model achieves strong performance in both interpolation and extrapolation settings. The code will be available at https://aka.ms/LeX-Transformer.
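A minimal sketch of the blockwise causal masking idea, in which each query attends causally to keys in its own block and the immediately preceding block so that relative distances stay bounded; this is a generic construction for illustration, not necessarily the paper's exact implementation:

```python
# Generic blockwise causal attention mask (True = position may be attended to).
import torch

def blockwise_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]                    # key index <= query index
    near_block = (pos[:, None] // block_size
                  - pos[None, :] // block_size) <= 1         # same or previous block
    return causal & near_block

print(blockwise_causal_mask(seq_len=8, block_size=4).int())
```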
Deep neural networks (DNNs) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remaining classes across all test samples. However, from a deployment perspective, an ideal model should (i) generate well-calibrated predictions for high-confidence samples (predicted probability, say, >0.95), and (ii) generate a higher proportion of legitimate high-confidence samples. To this end, we propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time. From a deployment standpoint in safety-critical applications, only high-confidence samples from a well-calibrated model are of interest, as the remaining samples have to undergo manual inspection. Reducing the predictive confidence of these potentially high-confidence samples is a downside of existing calibration approaches. We mitigate this by proposing a dynamic train-time data pruning strategy that prunes low-confidence samples every few epochs, yielding an increase in confident yet calibrated samples. We demonstrate state-of-the-art calibration performance across image classification benchmarks, reducing training time without much compromise in accuracy. We also provide insights into why our dynamic pruning strategy, which prunes low-confidence training samples, leads to an increase in high-confidence samples at test time.
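A hedged sketch of the pruning step described above: every few epochs, the samples on which the current model is least confident are dropped from the training set. The keep fraction, schedule, and surrounding training loop are illustrative assumptions, not the authors' exact recipe.

```python
# Illustrative dynamic train-time pruning of low-confidence samples.
import torch
from torch.utils.data import DataLoader, Subset

def prune_low_confidence(model, dataset, keep_fraction=0.9, device="cpu"):
    """Keep only the most confident fraction of the training samples."""
    model.eval()
    confidences = []
    with torch.no_grad():
        for x, _ in DataLoader(dataset, batch_size=256):
            probs = torch.softmax(model(x.to(device)), dim=-1)
            confidences.append(probs.max(dim=-1).values.cpu())
    confidences = torch.cat(confidences)
    keep = int(keep_fraction * len(dataset))
    kept_idx = torch.argsort(confidences, descending=True)[:keep]
    return Subset(dataset, kept_idx.tolist())

# inside the training loop (assumed schedule):
# if epoch % 5 == 0:
#     train_set = prune_low_confidence(model, train_set)
```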
A hallmark of human intelligence is the ability to learn new concepts purely from language. Several recent approaches have explored training machine learning models via natural language supervision. However, these approaches fall short in leveraging linguistic quantifiers (such as 'always' or 'rarely') and mimicking humans in compositionally learning complex tasks. Here, we present LaSQuE, a method that can learn zero-shot classifiers from language explanations by using three new strategies - (1) modeling the semantics of linguistic quantifiers in explanations (including exploiting ordinal strength relationships, such as 'always' > 'likely'), (2) aggregating information from multiple explanations using an attention-based mechanism, and (3) model training via curriculum learning. With these strategies, LaSQuE outperforms prior work, showing an absolute gain of up to 7% in generalizing to unseen real-world classification tasks.
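For intuition, strategy (1) amounts to assigning each quantifier a probability-like strength while enforcing ordinal relations such as 'always' > 'likely'; the numeric values in the sketch below are hypothetical and serve only to illustrate the ordering constraint, not LaSQuE's learned parameters.

```python
# Hypothetical quantifier strengths; only the ordinal relations matter here.
QUANTIFIER_STRENGTH = {"always": 1.00, "usually": 0.85, "likely": 0.70,
                       "sometimes": 0.40, "rarely": 0.10, "never": 0.00}
ORDINAL_RELATIONS = [("always", "likely"), ("likely", "rarely")]   # left > right

def ordering_violations(strengths, relations):
    return [(a, b) for a, b in relations if strengths[a] <= strengths[b]]

assert not ordering_violations(QUANTIFIER_STRENGTH, ORDINAL_RELATIONS)
```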
Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH in cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models for extracting SDoH, examined the generalizability of the NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients' notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, and 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models with new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among the 19 categories of SDoH varied greatly: 10 SDoH could be extracted from >70% of cancer patients, but 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.
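Once trained, such a transformer extractor is typically applied to free-text notes as a token-classification model; the sketch below uses a generic Hugging Face pipeline with a placeholder checkpoint and an invented example note, not the released SODA models.

```python
# Generic token-classification sketch for concept extraction from a clinical note.
from transformers import pipeline

extractor = pipeline("token-classification",
                     model="dslim/bert-base-NER",     # placeholder checkpoint
                     aggregation_strategy="simple")

note = "Patient lives alone, smokes one pack per day, and is currently unemployed."
for span in extractor(note):
    print(span["entity_group"], span["word"], round(float(span["score"]), 3))
```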
Recently, graph neural networks have been gaining attention for simulating dynamical systems, owing to their inductive nature, which leads to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely Hamiltonian and Lagrangian graph neural networks, graph neural ODEs, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulation, highlighting the similarities and differences in the inductive biases and graph architectures of these systems. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training systems, thus providing a promising route to simulating large-scale realistic systems.
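The two headline metrics, rollout error and conservation of quantities such as energy, can be summarized roughly as follows; this is a generic sketch, and the benchmark's exact definitions may differ.

```python
# Generic rollout-error and energy-drift metrics for a simulated trajectory.
import numpy as np

def rollout_error(pred_traj: np.ndarray, true_traj: np.ndarray) -> float:
    """Mean squared error of predicted states over the full rollout."""
    return float(np.mean((pred_traj - true_traj) ** 2))

def energy_drift(energy: np.ndarray) -> float:
    """Mean relative deviation of total energy from its initial value."""
    return float(np.mean(np.abs(energy - energy[0]) / np.abs(energy[0])))
```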
Graph neural networks (GNNs) find applications in various domains such as computational biology, natural language processing, and computer security. Owing to their popularity, there is an increasing need to explain GNN predictions, since GNNs are black-box machine learning models. One way to address this is counterfactual reasoning, where the objective is to change the GNN prediction through minimal changes in the input graph. Existing methods for counterfactual explanation of GNNs are limited to instance-specific local reasoning. This approach has two major limitations: it cannot offer global recourse policies, and it overloads human cognitive capacity with too much information. In this work, we study the global explainability of GNNs through global counterfactual reasoning. Specifically, we want to find a small set of representative counterfactual graphs that explains all input graphs. Towards this goal, we propose GCFExplainer, a novel algorithm powered by vertex-reinforced random walks on an edit map of graphs with a greedy summary. Extensive experiments on real graph datasets show that the global explanation from GCFExplainer provides important high-level insights into the model behavior and achieves a 46.9% gain in recourse coverage and a 9.5% reduction in recourse cost compared to the state-of-the-art local counterfactual explainers.
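The summarization goal, a small set of representative counterfactuals that covers all input graphs, can be pictured as a greedy coverage selection; the sketch below shows only that step in simplified form, not the vertex-reinforced random walk itself.

```python
# Simplified greedy coverage selection over candidate counterfactual graphs.
def greedy_summary(candidates, input_graphs, covers, budget):
    """covers(c, g) -> True if counterfactual c provides recourse for graph g."""
    uncovered, summary = set(input_graphs), []
    while uncovered and len(summary) < budget:
        best = max(candidates, key=lambda c: sum(covers(c, g) for g in uncovered))
        summary.append(best)
        uncovered -= {g for g in uncovered if covers(best, g)}
    return summary
```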
Machine Translation Systems (MTS) are effective tools for converting text or speech from one language into another. In a large multilingual environment such as India, where English and a set of Indian Languages (ILs) are in official use, the need for effective translation systems becomes evident. In contrast to English, ILs are still regarded as low-resource languages due to the unavailability of corpora. To address this asymmetric nature, multilingual neural machine translation (MNMT) systems have evolved as an ideal approach in this direction. In this paper, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems: one for English-to-Indic (one-to-many) and another for Indic-to-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have very little parallel corpus, insufficient for training any machine translation model, we explore various augmentation strategies to improve the overall translation quality with the proposed model. The state-of-the-art Transformer architecture is used to realize the proposed model. Experiments on a large amount of data reveal its superiority over conventional models. In addition, the paper addresses the use of language relatedness (in terms of dialect, script, etc.), in particular the role of high-resource languages of the same family in improving the performance of low-resource languages. Moreover, the experimental results also show the advantage of backtranslation and domain adaptation for ILs in improving the translation quality of the source and target languages. Using all these key approaches, our proposed model proves more effective than the baseline models in terms of the evaluation metric, i.e., BLEU (BiLingual Evaluation Understudy) scores, for a set of ILs.
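In such one-to-many and many-to-one setups, the shared encoder-decoder is usually steered by prepending a target-language tag to every source sentence; the tag format and language codes below are illustrative assumptions, not the paper's exact preprocessing.

```python
# Generic target-language tagging for a shared multilingual NMT model.
def add_language_tag(source_sentence: str, target_lang: str) -> str:
    return f"<2{target_lang}> {source_sentence}"

print(add_language_tag("How are you?", "hi"))   # English -> Hindi direction
print(add_language_tag("How are you?", "ta"))   # English -> Tamil direction
```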
Lagrangian and Hamiltonian neural networks (LNNs and HNNs, respectively) encode strong inductive biases that allow them to significantly outperform other models of physical systems. However, so far these models have mostly been limited to simple systems such as pendulums and springs, or to single rigid bodies such as a gyroscope or a rigid rotor. Here, we present a Lagrangian graph neural network (LGNN) that can learn the dynamics of rigid bodies by exploiting their topology. We demonstrate the performance of LGNN by learning the dynamics of ropes, chains, and trusses whose bars are treated as rigid bodies. LGNN also exhibits generalizability: an LGNN trained on chains with a few segments generalizes to simulate chains with a large number of links and arbitrary link lengths. We also show that LGNN can simulate unseen hybrid systems of bars and chains on which it has not been trained. Specifically, we show that LGNN can be used to model the dynamics of complex real-world structures, such as the stability of tensegrity structures. Finally, we discuss the non-diagonal nature of the mass matrix and its ability to generalize in complex systems.
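For reference, the Lagrangian formulation underlying these models recovers accelerations from a learned Lagrangian L(q, q̇) via the Euler-Lagrange equations; the standard textbook form is shown below, not the paper's specific parameterization.

```latex
% Euler-Lagrange equations and the implied accelerations (index form):
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_j} = \frac{\partial L}{\partial q_j},
\qquad
\ddot{q}_i = \sum_j \Big[\Big(\frac{\partial^2 L}{\partial \dot{q}\,\partial \dot{q}^{\top}}\Big)^{-1}\Big]_{ij}
\Big(\frac{\partial L}{\partial q_j} - \sum_k \frac{\partial^2 L}{\partial \dot{q}_j\,\partial q_k}\,\dot{q}_k\Big)
```

For the common rigid-body form L = (1/2) q̇ᵀ M(q) q̇ - V(q), the velocity Hessian above is exactly M(q), the generally non-diagonal mass matrix discussed at the end of the abstract.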